The primary tool used was Middlesex University’s INteractive VIsual Search and QUery Environment (INVISQUE). The INVISQUE user interface (UI), written in Adobe Flash, is supported by middleware written in Java that queried the MC3 dataset stored in a MySQL database.
In addition, the University of Leeds used a Python-based implementation of its corpus analysis algorithm to generate log-likelihood statistics for the words in the MC3 news article corpus. These statistics were also stored in the MySQL database for ease of access.
A simple Python script was used to transfer the MC3 data and the University of Leeds analysis results, which were also in plain text files, into the MySQL database.
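The transfer step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the script that was actually used: it assumes each Leeds result line is tab-separated as article id, word, log-likelihood score, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server. The table name `keywords` and its columns are hypothetical, and only the Leeds results half of the transfer is shown.

```python
import csv
import io
import sqlite3

# Hypothetical schema: one row per (article, word) with its log-likelihood score.
SCHEMA = """
CREATE TABLE IF NOT EXISTS keywords (
    article_id TEXT NOT NULL,
    word       TEXT NOT NULL,
    ll         REAL NOT NULL,
    PRIMARY KEY (article_id, word)
)
"""


def load_keywords(conn: sqlite3.Connection, lines) -> int:
    """Insert tab-separated 'article_id<TAB>word<TAB>ll' lines; return rows loaded."""
    conn.execute(SCHEMA)
    reader = csv.reader(lines, delimiter="\t")
    rows = [(article_id, word, float(ll)) for article_id, word, ll in reader]
    # sqlite3 uses '?' placeholders; a MySQL driver would typically use '%s'.
    conn.executemany("INSERT INTO keywords VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    # Made-up sample lines standing in for a results file (not real Leeds output).
    sample = io.StringIO("doc1\tlecture\t10.0\ndoc1\tlab\t8.0\ndoc2\triver\t7.0\n")
    conn = sqlite3.connect(":memory:")
    print(load_keywords(conn, sample))  # 3
```

Swapping sqlite3 for a MySQL driver would change only the connection and placeholder style; the parsing and bulk insert stay the same.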
Video:
ANSWERS:
MC 3.1 Potential Threats: Identify any imminent terrorist threats in the
Vastopolis metropolitan area. Provide detailed
information on the threat or threats (e.g. who, what, where, when, and how) so
that officials can conduct counterintelligence activities. Also, provide a list
of the evidential documents supporting your answer.
There were many interesting activities in Vastopolis, but most could not be considered imminent terrorist threats. However, one situation that may pose an imminent threat involves equipment stolen from the lab of molecular biologist Professor Edward Patino. Prof. Patino has been harassed by the group Citizens for Ethical Treatment of Lab Mice, who in turn are affiliated with the Forever Brotherhood of Antarctica. The Professor himself has recently given lectures on the threat of bioterrorism, and the Center for Disease Control (CDC) also released a recent report highlighting the dangers of bioterrorism. Since the robbery of the professor’s lab, the Brotherhood and the Citizens for Ethical Treatment of Lab Mice have shown an increased level of activity. Lastly, dead fish have turned up in the Vast River. Therefore, we believe that there may be an imminent threat to the Vastopolis metropolitan area from the Forever Brotherhood of Antarctica and their affiliates, the Citizens for Ethical Treatment of Lab Mice, involving some form of biological weapon created from the equipment stolen from Professor Patino’s lab.
Table 1: Timeline of News Articles
Date of article (DD-MM-YYYY) | Event
11-04-2011 | Prof. Patino gives lecture on bioterrorism
18-04-2011 | CDC releases publication on threats of bioterrorism
26-04-2011 | Prof. Patino’s lab gets robbed
02-05-2011 | Mayor’s dog gets kidnapped
03-05-2011 | Basketball team’s mascot goes missing from Vastopolis Dome
09-05-2011 | Citizens for Ethical Treatment of Lab Mice send threatening emails to Vast Press
19-05-2011 | Dead fish found in Vast River
The other events in Vastopolis, which were discounted as either resolved or self-contained, include:
1. Military weapons went missing from the Vastopolis Armed Forces on 26-04-2011, and on 30-04-2011 military-grade weapons were used in a park shootout in Southville. However, the weapons were recovered at Vastopolis Airport on 20-05-2011.
2. Two mental patients affiliated with the psychobrotherhood escaped the Vastopolis Center for the Criminally Insane on 27-04-2011 but were caught on 12-05-2011 while trying to make a bomb. No further information on the psychobrotherhood was available in the corpus.
3. An Antarctica Airlines plane crashed and traces of explosives were found in the wreckage, but this is a past event. In addition, although there were articles about poor security at Vastopolis Airport, security was increased following the crash.
4. A 60-year-old man built an improvised explosive device to kill his neighbor’s cat, KeeKee, but that was a self-contained incident.
5. A man with a bomb concealed in a turkey was stopped at Vastopolis Airport, but that news article did not provide any leads for further investigation.
6. The daughter of a military counter-intelligence agent was raped by another soldier and her identity was exposed, but the article provided no leads to follow.
7. F-Alliance, a group of hackers made up of high-school dropouts, were arrested, so this is another resolved issue.
8. Anarchists for Freedom issue daily threats to Vastopolis officials, but there is no evidence that they do more than bark.
9. Lastly, Vastopolis was included in a general threat issued by the overseas terror group Network of Dread.
We used the INteractive VIsual Search and QUery Environment (INVISQUE), a prototype visual analytics interface created at Middlesex University, to visually sift through the news corpus. INVISQUE uses an index-card visualization to represent individual information items, in this case the news articles, and arranges them on screen along X and Y axes. Figure 1 shows the results of a keyword search for “bomb”, arranged on the X axis by significance and on the Y axis by date, so that articles with a higher significance for the keyword “bomb” appear further to the left and newer articles appear higher up the Y axis.
Figure 1: INVISQUE index-card visualization arranged on X-Y axes
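The X-Y arrangement described above can be illustrated with a small layout function. This is a hypothetical sketch, not INVISQUE’s Flash code: the `Card` type, the canvas size and the linear scaling are assumptions. Higher significance maps to a smaller x (further left), and a newer date maps to a larger y; the sketch uses a bottom-left origin, so larger y means higher on screen.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Card:
    title: str
    significance: float  # significance (LL score) of the search keyword
    published: date


def layout(cards, width=1000.0, height=600.0):
    """Return (card, x, y) triples: x falls as significance rises, y rises with date."""
    if not cards:
        return []
    sigs = [c.significance for c in cards]
    days = [c.published.toordinal() for c in cards]
    s_min, s_max = min(sigs), max(sigs)
    d_min, d_max = min(days), max(days)
    placed = []
    for card, day in zip(cards, days):
        # Normalise both values to [0, 1]; a zero-width range collapses to 0.
        s_norm = (card.significance - s_min) / (s_max - s_min) if s_max > s_min else 0.0
        d_norm = (day - d_min) / (d_max - d_min) if d_max > d_min else 0.0
        x = (1.0 - s_norm) * width  # most significant card sits at x = 0 (far left)
        y = d_norm * height         # newest card sits at the top of the Y axis
        placed.append((card, x, y))
    return placed
```

For example, a card with the highest significance and the oldest date lands at (0, 0), the bottom-left corner, while the least significant and newest card lands at the top-right.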
The “significance” value was calculated by our collaborators at the University of Leeds, who performed keyword extraction on the news corpus. Keyword extraction is a standard corpus linguistics technique for genre classification that pinpoints statistically significant, or “key”, words for a genre by comparison with a general reference corpus. For the MC 3 corpus, the significance calculation compared the word frequency distribution in each of the 4474 news articles, each treated as a test set, with the corresponding distribution in the entire news article dataset, which served as the reference corpus. The Leeds program verifies apparent overuse of lexical items in each article by computing the difference between their observed frequencies and the norm, represented by their expected frequencies in the whole dataset, and expresses this difference as a log-likelihood (LL) statistic. Words with LL scores of 6.63 or above are statistically significant (p < 0.01).
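The log-likelihood statistic described above is commonly computed with the Rayson and Garside formulation for comparing a word’s frequency across two corpora. The sketch below assumes that formulation, since the exact Leeds implementation is not described here. For a word with frequency a in an article of c tokens and frequency b in a reference corpus of d tokens, the expected frequencies are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and LL = 2(a ln(a/E1) + b ln(b/E2)). The function names and tokenised inputs are hypothetical; only overused words (observed frequency above expected) are kept, mirroring the description above, and 6.63 is the chi-square critical value for one degree of freedom at p < 0.01. Whether an article’s own counts were removed from the reference corpus is not stated, so the sketch simply takes both corpora as given.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood (Rayson-Garside form) for one word.

    a: frequency of the word in the article (test set)
    b: frequency of the word in the reference corpus
    c: total tokens in the article
    d: total tokens in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected frequency in the article
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    ll = 0.0
    if a > 0:  # a term with zero observed frequency contributes 0
        ll += a * math.log(a / e1)
    if b > 0:
        ll += b * math.log(b / e2)
    return 2.0 * ll


def key_words(article_tokens, reference_tokens, threshold=6.63):
    """Return {word: LL} for words overused in the article with LL >= threshold,
    ordered from most to least significant."""
    art = Counter(article_tokens)
    ref = Counter(reference_tokens)
    c, d = sum(art.values()), sum(ref.values())
    scores = {}
    for word, a in art.items():
        b = ref[word]  # Counter returns 0 for unseen words
        # a/c <= b/d means the word is not overused relative to the reference.
        if a * d <= b * c:
            continue
        ll = log_likelihood(a, b, c, d)
        if ll >= threshold:
            scores[word] = ll
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))
```

The overuse check compares cross-multiplied counts rather than ratios, which is equivalent to testing a > E1 and avoids floating-point comparison.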
Single-word searches (e.g. “bioterrorism”) in the INVISQUE interface are applied against the word list generated by the University of Leeds (see Figure 2) and produce index cards that have the matching keyword as the title and the keyword’s significance as the top-left value. Multi-word phrases (e.g. “Vast River”) are applied against the full article text; in this case, the title of each index card is the article’s most significant keyword, as calculated by Leeds. Both search modes are illustrated in Figure 3.
Figure 2: Table of Words in MySQL containing the results of the University of Leeds keyword extraction
Figure 3: Searching Using INVISQUE
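The two search modes can be sketched as a dispatch on whether the query contains a space. This is an illustrative sketch, not the Java middleware: it assumes the Leeds keyword table is held in memory as a mapping from article id to {word: LL score}, that article texts are plain strings, and that matching is case-insensitive. The card value shown for phrase searches (the top keyword’s score) is also an assumption, since the source only specifies the card title in that case. All names are hypothetical.

```python
def search(query, keyword_scores, articles):
    """Return (article_id, card_title, card_value) tuples for a query.

    keyword_scores: {article_id: {word: ll_score}} from the keyword extraction.
    articles: {article_id: full article text}.
    """
    q = query.strip().lower()
    results = []
    if " " not in q:
        # Single word: look it up in the keyword list; the card title is the
        # keyword itself and the card value is its significance in that article.
        for art_id, scores in keyword_scores.items():
            if q in scores:
                results.append((art_id, q, scores[q]))
    else:
        # Phrase: match against the full text; the card title becomes the
        # article's most significant keyword (value assumed to be its score).
        for art_id, text in articles.items():
            if q in text.lower():
                scores = keyword_scores.get(art_id, {})
                if scores:
                    top = max(scores, key=scores.get)
                    results.append((art_id, top, scores[top]))
                else:
                    results.append((art_id, None, None))
    # Most significant first, matching the left-to-right X-axis ordering.
    results.sort(key=lambda r: r[2] if r[2] is not None else float("-inf"), reverse=True)
    return results
```

Keeping both modes behind one function mirrors the single search box in the interface: the user types either a word or a phrase, and the dispatch decides which data source to query.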
As shown in Figure 4, each card also gives a “gist” of its article by displaying the article’s three most significant keywords, the article title, the Vastopolis locations mentioned in the article (extracted and appended to the card by the middleware using a pre-compiled list of locations), and the date of the article. The cluster of returned cards can be filtered by any of these card fields, and each card also has a shortcut to the full text of its article.
Figure 4: Index Card Fields
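The card “gist” and field filtering can be sketched as below. The field names, the sample location list and the case-insensitive substring matching are assumptions for illustration; the source states only that locations come from a pre-compiled list and that cards can be filtered by any card field. `make_card` and `filter_cards` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical excerpt of a pre-compiled list of Vastopolis place names.
LOCATIONS = ["Vast River", "Southville", "Vastopolis Airport", "Vastopolis Dome"]


@dataclass
class IndexCard:
    article_id: str
    title: str
    published: date
    top_keywords: list = field(default_factory=list)
    locations: list = field(default_factory=list)


def make_card(article_id, title, published, text, keyword_scores, locations=LOCATIONS):
    """Build a card showing the article's three most significant keywords
    and every listed location that appears in its text."""
    top3 = sorted(keyword_scores, key=keyword_scores.get, reverse=True)[:3]
    lowered = text.lower()
    found = [loc for loc in locations if loc.lower() in lowered]
    return IndexCard(article_id, title, published, top3, found)


def filter_cards(cards, keyword=None, location=None, since=None):
    """Keep cards matching every supplied criterion (None means 'any')."""
    kept = []
    for card in cards:
        if keyword is not None and keyword not in card.top_keywords:
            continue
        if location is not None and location not in card.locations:
            continue
        if since is not None and card.published < since:
            continue
        kept.append(card)
    return kept
```

Because filters combine with AND, narrowing a cluster step by step (for example, first to a location and then to a date range) gives the same result as applying both filters at once.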
As demonstrated in the accompanying video, the primary technique we used to explore the provided corpus was visual searching and filtering. This technique allowed us to explore the corpus thoroughly and quickly, and we had a good picture of the events in Vastopolis within hours of beginning our exploration.